Detecting Data Errors with Employing Negative Association Rules
نویسندگان
چکیده
Ever increasing amount of data has led to the fact that the data quality has had a crucial impact on almost any IT applications. Nonetheless, data quality issues are nearly omnipresent. Recently, a great deal of researches has focused on improving data quality. Bad quality of the data can cause incorrect decision making. It is normally infeasible to guarantee sufficient data quality through manual inspection. Therefore (semi-)automatic data cleansing methods have to be employed. Data mining methods are ideal for this purpose, since they are aimed at finding abnormal patterns in large volumes of data. In this paper, we show the utility of negative association rules for detecting dubious data. The evaluations show that employing negative association rules provide interesting results for identifying incorrect data entries.
منابع مشابه
Employing data mining to explore association rules in drug addicts
Drug addiction is a major social, economic, and hygienic challenge that impacts on all the community and needs serious threat. Available treatments are successful only in short-term unless underlying reasons making individuals prone to the phenomenon are not investigated. Nowadays, there are some treatment centers which have comprehensive information about addicted people. Therefore, given the ...
متن کاملDesign and Implementation of a Software System for Detecting Orthographical or Morphological Errors in Persian Words
This paper presents a new method for analyzing words in the Persian language context to find orthographical and structural errors regardless of the meaning. This technique tokenizes each word in a statement then tries to detect the kind of word, and analyses its correctness in terms of orthography and morphology by means of a lexicon. It should be noted that some words in the Persian language h...
متن کاملChi-Square Test for Anomaly Detection in XML Documents Using Negative Association Rules
Anomaly detection is the double purpose of discovering interesting exceptions and identifying incorrect data in huge amounts of data. Since anomalies are rare events, which violate the frequent relationships among data. Normally anomaly detection builds models of normal behavior and automatically detects significant deviations from it. The proposed system detects the anomalies in nested XML doc...
متن کاملIntroducing an algorithm for use to hide sensitive association rules through perturb technique
Due to the rapid growth of data mining technology, obtaining private data on users through this technology becomes easier. Association Rules Mining is one of the data mining techniques to extract useful patterns in the form of association rules. One of the main problems in applying this technique on databases is the disclosure of sensitive data by endangering security and privacy. Hiding the as...
متن کاملControlling False Positives in Association Rule Mining
Association rule mining is an important problem in the data mining area. It enumerates and tests a large number of rules on a dataset and outputs rules that satisfy user-specified constraints. Due to the large number of rules being tested, rules that do not represent real systematic effect in the data can satisfy the given constraints purely by random chance. Hence association rule mining often...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- JDCTA
دوره 3 شماره
صفحات -
تاریخ انتشار 2009